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Abstract 

Non-negative least squares (NNLS) is one of the most fundamental problems in 
numeric analysis. It has been widely used in scientific computation and data modeling. 
When applying to huge and complex data in big data analytics, the limitations of NNLS 
algorithm speed and accuracy remain as key challenges. In this paper, we develop a 
fast and robust anti-lopsided algorithm for NNLS with high accuracy that is totally 
based on the first order methods. The main idea of the proposed algorithm is to 
transform the original NNLS into an equivalent non-negative quadratic programming, 
which significantly reduces the scaling problem of variables to discover more effective 
gradient directions. The proposed algorithm reaches high accuracy and fast speed with 
linear convergence (1— 9 ||q||^ ) k hi the subspace of passive variables where y/n < 11Q11 2 < 
n, and n is the dimension size of solutions. The experiments on large matrices clearly 
show the high performance of the proposed algorithm in comparison to the state-of- 
the-art algorithms. 

Keywords: Non-negative least squares, Large-scale anti-lopsided algorithm, First 
order methods. 
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1 Introduction 


Non-negative least squares (NNLS) problem that aims to minimize the sum of squares of the 
errors is one of the most fundamental problems in numeric analysis. It has been widely used 
in scientific computation and data mining to approximate observations [CP09]. Particu¬ 
larly, in many fields such as image processing, computer vision, text mining, environmetrics, 
chemometrics, or speech recognition, we often need to estimate a large number of observa¬ 
tions b G lZ d by a set of measurements or basis factors {A} contained in a matrix A G lZ dxn , 
where the popular task is to approximate b by minimizing |||Zhr — £>| ||. Moreover, in com¬ 
parison to least squares (LS), NNLS has more concisely interpretable solutions, of which 
non-negative coefficients x G 'RA can be interpreted as contributions of the measurements 
over the observations. In the contrast, mixed-sign coefficients of LS solutions are uninter¬ 
pretable because they lead to overlapping and mutual elimination of the measurements. 

Since 1990s, the methods of nonnegative matrix or tensor factorizations have widely used 
NNLS to achieve a low-rank representation of nonnegative data [KB09, Zhall]. Particularly, 
the low-rank representation transfers data instances into a lower-dimensional space of com¬ 
ponents or sources to obtain higher speed and accuracy, and more concise interpret-ability 
of data processing that are essential in applications of signal and image processing, machine 
learning, and data mining [CP09]. 

Gradient methods are usually employed to solve NNLS because there is not any general 
formula of solutions, although NNLS is a convex optimization problem having a unique so¬ 
lution. As other gradient methods, the performance of NNLS algorithms mainly depends on 
selecting appropriate directions to optimize the objective function. To do this, most effective 
algorithms remove redundant variables based on the concept of active sets [BDJ97, CP09] 
in each iteration, although they are different in the employed strategies [CP09]. Fundamen¬ 
tally, these algorithms are based on the observation that several variables can be ignored 
if they are negative when the problem is unconstrained [BDJ97, LH74, VBK04], In other 
words, NNLS can be considered as an unconstrained problem in a subspace of several vari¬ 
ables [KDD13] that are positive in the optimal solution. In addition, algorithms using the 
second derivative [BDJ97, LH74, VBK04] discover effective directions that more effectively 
reduce the objective function. However, these approaches have two main drawbacks: invert- 
ibility of A T A and its heavy computation, especially for the methods recomputing ( A T A 
several times for different passive sets. Hence, other algorithms using only the first deriva¬ 
tive [FHN05, KDD13, Potl2] can be more effective for large-scale least squares problems. 

In our view, to discover more appropriate gradient directions is critical to enhance the 
performance of NNLS algorithms. In this paper, we propose a fast robust iterative algorithm 
to solve that issue called anti-lopsided algorithm. The main idea of this algorithm is to 
transfer the original problem into an equivalent problem, by which scaling problems in the 
first order methods are significantly reduced to obtain more effective gradient directions. 
The proposed algorithm has significant advantages: 

• Fastness: The proposed algorithm attains linear convergence rate of (1 — ^jy^yjy)in 
the subspace of passive variables, where y/n < HQIb < n and n is the dimension size of 
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solutions. In addition, it does not require computing the inverse of matrices (A T A) 1 
because it is totally based on the first derivative. 

• Robustness: It can stably work in ill-conditioned cases since it is totally based on 
the first derivative, 

• Effectiveness: The experimental results are highly comparable with the state-the-art 
methods, 

• Convenience: Interestingly, the method is convenient to be implemented, paralleled 
and distributed since it is totally based on the first order derivative and gradient 
methods using the exact line search. 

The rest of paper is organized as follows. Section 2 discusses the background and related 
work of least square problems. Section 3 mentions the details of our proposed algorithm anti¬ 
lopsided. Subsequently, the theoretical analysis is discussed in Section 3. Finally, Section 5 
shows the experimental results, and Section 6 summarizes the main contributions of this 
paper. 


2 Background and Related Work 

In this section, we introduce the non-negative least square (NNLS) problem, its equivalent 
problem as well as non-negative quadratic problem (NQP), and significant milestones of the 
algorithmic development for NNLS. 

2.1 Background 

Non-negative least square (NNLS) can be considered one of the most central problems in 
data modeling to estimate the parameters of models for describing the data [CP09]. It comes 
from scientific applications where we need to estimate a large number of vector observations 
b G lZ d by a set of measures or basis factors {A} contained in a matrix A G TZ dxn . The 
common task is to approximate an observation b by the measures by minimizing |||Ax — b\\\. 
Hence, we can define NNLS as follows: 

Definition 1. Given n measurement vectors A = [Ai, A 2 , ...,H n ] G 7Z dxn and an observed 
vector b G lZ d , find an optimal solution x of the optimization problem: 

minimize -||Ae — b\^ 

x 2 

subject to x y 0 d) 

where A G TZ dxn , b G lZ d 
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Obviously, it can be equivalently turned into a non-negative quadratic programming 
(NQP) problem: 


minimize 

X 

f(x ) = -x T Hx + h T x 

(2) 

subject to 

x y 0. 

where 

H = A T A, h = -A T b 



From this NQP formulation, these problems are convex since Q is positive semidehnite 
and the non-negativity constraints form a convex feasible set. In this paper, we will solve 
Problem 2 instead of Problem 1. 


2.2 Related Work 

In last several decades of development, many different approaches have been proposed to 
tackle the NNLS problem, which can be divided into two main groups: active-set methods 
and iterative methods [CP09]. 

Active-set methods are based on the observation that variables can be divided into sub¬ 
sets: active and passive variables [GMW87]. Particularly, the active set contains variables 
being zero or negative when solving the least square problem without concerning non-negative 
constraints, otherwise the other variables belongs to the passive set. The active-set algo¬ 
rithms employ the fact that if the active set is identified, the values of the passive variables 
in NNLS are their values in the unconstrained least squares solution when removing active 
variables that will be set to zero. Unfortunately, these sets are unknown in advance. Hence, 
a number of iterations is employed to find out the passive set, each of which needs to solve 
a unconstrained least squares problem on the passive set to update the passive set and the 
active set. Concerning the significant milestones of the active-set methods, Lawson and Han¬ 
son (1974) [LH74] proposed a standard algorithm for active-set methods. Subsequently, Bro 
and De Jong (1997) [BDJ97] avoided unnecessary re-computations on multiple right-hand 
sides to speed up the basic algorithm [LH74], Finally, Dax (1991) [Dax91] proposed selecting 
a good starting point by Gauss-Seidel iterations and moving away from a “dead point” to 
reduce the number of iterations. 

On the other hand, the iterative methods use the first order gradient on the active set 
to handle multiple active constraints in each iteration, while the active-set methods only 
handle one active constraint [CP09]. Hence, the iterative methods can deal with larger- 
scale problems [KSD06, KDD13] than the active-set methods. However, they are still not 
guaranteed the convergence rate. More recently, accelerated methods [Nes83] and proximal 
methods [PB13] having a fast convergence 0(1/k 2 ) [GTLY12] only require the first order 
derivative. However, one major disadvantage of the accelerated methods is that they require 
a big number of iterations when the solution requires high accuracy because the step size is 
limited by jj that is usually small for large-scale NNLS problems with big matrices; where 
M is Lipschitz constant. 

In summary, active-set methods and iterative methods are two major approaches in 
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Figure 1: 40 optimizing steps in the iterative exact line search method using the 
first order derivative for the function 3 starting at xq = [30 2] T 


solving NNLS. The active-set methods accurately solve nonnegative least square problem, 
but require a huge computation for solving unconstrained least squares problems and are 
unstable when A T A is ill-conditioned. The iterative methods have more potential in solving 
large-scale NNLS because they can handle multiple active constraints per each iteration. In 
our view, the iterative methods are still ineffective due to the scaling problem on variables 
that seriously affects the finding of appropriate gradient directions. Therefore, we propose 
the anti-lopsided algorithm as an iterative method that re-scales variables to reduce the 
scaling problem, obtain appropriate gradient directions, and achieve linear convergence on 
the sub-space of passive variables. 


3 Anti-lopsided algorithm 

In this section, we discuss our new idea to develop a fast and robust iterative algorithm 
for large-scale NNLS problem. For more readability, we separate the proposed algorithm 
into two module algorithms: Algorithm 1 for solving NNLS problem by first transforming 
it into a NQP problem and re-scaling variables by which the scaling problem of variables 
is significantly reduced, then calling Algorithm 2 for solving NQP problem by a first-order 
gradient method using the exact line search. 

It is well known from the literature that iterative methods using the first order derivative 
can be suitable for designing large-scale algorithms for NNLS. However, this approach heavily 
depends on the scaling of variables [BV04], For example, if we employ the iterative exact 
line search method using the first order derivative to optimize the function 3, we need 40 
iterations to reach the optimal solution (see Figure 1): 

x + [—4 — 5]x (3) 

Hence, we re-scale variables into a new space, in which new variables have more balanced 
roles and for that reason we call our algorithm as anti-lopsided algorithm. Particularly, we 
re-scale: 



x = 


y 

^diag(H) 


or 



Vi 


(4) 
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Algorithm 1: Anti-lopsided algorithms for NNLS 

Input: A G TZ dxn ; b G lZ d 
Output: x minimizing \\\Ax — b\\\ 
subject to: x >z 0 


1 begin 

2 H = 

3 Q — 


A T A- 

_ H _. 

yj diag(H)diag(H) T 1 


4 

5 

6 

7 


^ \Jdiag(H) ’ 

y = solveNQP(Q, g)/*by Algorithm 2*/; 
x = y 

\jdiag(H) ’ 

return x 


It is noticeable that = AfAi = ||Aj||| > 0. For the special case Ha = 0, we give 
Xj = ?/j. After re-scaling variables, the original Problem 2 is equivalently transformed into 
NQP Problem 5: 


minimize 

y 

f(y) = 

= 2 y T Qy + y 


subject to 

yh o 

Hij ' hi 

m — tt — 1 / tj 

(5) 

where 

Qij 



Remark 1. Consider the values of matrix Q, we have: 


® Qii 



cos (Aj, Aj) 


1, V z = 1 ,n 


• Qij = ~^§~h~ = cos (^’ v 1 < J < n 

Obviously, the scaling problem of variables is significantly reduced because the roles of 
variables in the function become more balanced. The shape of function is transformed from 
ellipse-shaped (see Figure 1) to ball-shaped (see Figure 2). Hence, the first order methods 
can much more effectively work. For example, we need only 3 iterations instead of 40 
iterations to reach the optimal solution for the function 3 with the same initial point y 0 , 
where y 0i = x 0i .Ha, Vi (see Figure 2). 

Subsequently, we discuss on Algorithm 2 for solving NQP with input Q and q. For each 
iteration, the objective function is optimized on the passive set: 


P{x) = {xi\xi > 0 or V/j (x) < 0} 

Hence, the first order gradient will be projected into the sub-space of the passive set 
Vf — [V/]p( x) (V/j = V/j for * G P(x), otherwise V/j = 0). Noticeably, the passive set 
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Figure 2: 3 optimizing steps in the iterative exact line search method using the first 
order derivative after applying anti-lopsided steps starting at yo = xq. * ^diag(H) 


can change through iterations, and Algorithm 2 is converged when P(x) = 0 or 11 V/| ( 2 < e. 
In addition, the orthogonal projection on the sub-space of passive variables x = [xk + a\7 f] + 
is trivial [KDDI 3] since NQP problem 5 is a strongly convex problem on a convex set. 


Algorithm 2 : Gradient Method using Exact Line Search for NQP 

Input: Q G TZ nxn ] q G lZ n 

Output: x minimizing fix) = | x T Qx + q T x 



subject to: x >z 0 

l begin 

2 

X 

° = 0; 

3 

< 

55 

4 

repeat 

5 


/* Remove active variables and keep passive variables*/; 

6 


V/ = V/[z > 0 or V/ < 0]; 

7 


a - argmm af (x k a V/) - J | T q!/ f ; 

8 


x fc+ i = [x fe - aV/] + ; 

9 


V/ = V/ + Q(Afc+i - x fc ); 

10 

until P{x ) = 0 or V/ 2 < e; 

11 

return Xk 





Regarding computing a of Algorithm 2, we have: 

a 2 

f(x - aWf) = -aV^Qx + q] + —WfQVf + C 

di = vf T (Qx + q) = i \vm 

da V f T QV f Vf T QX7f 

where C is constant 
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4 Theoretical Analysis 

In this section, we investigate the convergence and complexity of the proposed algorithm. 


4.1 Convergence 

Concerning the convergence rate, our method argues Barzilai and Borwein’s note that NNLS 
is an unconstrained optimization on the passive set of variables [BDJ97]. Moreover, the 
orthogonal projection on the sub-space of passive variables x = [x^-t-aV/]+ is trivial [KDD13] 
since NNLS and its equivalent problem (NQP) is a strongly convex problem on a convex set. 
Hence, in this section, we consider the convergence rate of Algorithm 2 in the sub-space of 
passive variables. 

Let /(x) = \x T Qx + q T x, where Qij = cos(Aj, A,) = cos(di, af) and where a*, = is 

the unit vector Ak, V/c. 

Since /(x) is strongly convex, we have: 

• 3m, M > 0 satisfy ml -< V 2 / r< MI 


• Vx,j/ : f(y) > /(x) + (V/(x),(y-x)) + f\\y-x\\ 2 

• Vx,y : f(y) < f(x) + (V/(x),(y-x)> + f \\y - x\\ 2 
Based on [CJ12], we have: 

Theorem 4.1. After (k + 1) iterations, f(x k+1 ) — f* < (1 — ||) fe (/(x 0 ) — /*), where f* is 
the minimum value of f(x). 

Proof. Since f(y ) < /(x) + (V/, y — x) + ^\ \y — x|| 2 Vx, y selecting y = x — -^V/ and x + is 

the updated value of x after an iteration by the first order gradient using exact line search, 

we have: 

/u+) </(* - ^v/i < /<*) - i\wm + (6) 

</M - 2^I|V/||| 

Hence, for the minimum value f* of the objective function, we have: 


f(x k + i) ~ f* < {f{x k ) ~ f“) 



(7) 


Consider f(y) = f{x) + (V/, y — x) + y | |t/ — x| || (fixing x) is a convex quadratic function 
of y. Hence, /(?/) minimizes when V/(y) = 0 y = y = x — yV/. In addition, since 
f(y) > f(x) + (Vf,y- x) + f ||r/ - x||| Vx,y, we have: 


/(?/) >/(a) + (V/, 2 / - x) + —| 
=/(^) - ^Il v /ll2 Vx,y 
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Selecting y = x* and x = Xk where x* is the optimal solution, we have: 


— IIV/H 2 < 2m(f* — f(xk)) 


From (7) and (9), we have the necessary result: 

7Y1 7Y1 

h +1 -/*<(i- - D < (i - - /') 


( 9 ) 


Lemma 4.2. For f(x) = \x T Qx + q T x 1 then A V 2 / A ||(2|| 2 / and ■>Jn < 11<5|I 2 < n, 
where Qij = cos(ai,aj), a, , a 3 are unit vectors, and 11<51|a — 1 Xq =1 Qlj- 

Proof. We have V 2 / = Q, and 

\x T Ix < + YJi=i x i a i\\ 2 = Ya= 1 YJj =1 Qi.i-i'i-i-j = x T Qx for Vx => \l r< V 2 /. 

Moreover, based on Cauchy-Schwarz inequality, we have: 


i= 1 j=l i=l j =1 2=1 j=l 


£ £ Cl.rF-0 < 

*=1 3 =1 


N 


IIOIIKE 


X. 


2Y2 


i =1 


^x t Qx < \\Q\\ 2 x T Ix (Vx) <=*► Q ||<2|| 2 / 


Finally, Cn = y/Y%=iQu < IIQII2 = E"=i < Vn 2 = n since -1 < Q i;j = 

cos(aj, cij) <1. ■ 

From Theorem 4.1 and Lemma 4.2, setting m = | and M = ||Q|| 2 , we have: 

Theorem 4.3. After fc + 1 iterations, f(x k+1 ) — /(x*) < (1 — 9 ||q||.. ) k (f( x °) ~ /(x*)), where 
Vn < HQH 2 < n and n is the dimension of x. 


4.2 Complexity 

Concerning the average complexity of Algorithm 2, we consider the important operators: 

• Line 7: The complexity of computing 11 V/||| is 0(n), and that of V/QV/ is 0(nS(n)). 

• Line 9: The complexity of computing Q(xk +1 — a:*) is C(nS'(n)). 

where S(n) is the average number of non-zero elements of V/. Noticeably, the sparsity of 
(xk +1 — Xk) equals to the sparsity of V/ since Xk +1 = x*, + aV/. 

Lemma 4.4. The average complexity of Algorithm 2 is 0(knS(n )), where k is the average 
number of iterations. 
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Therefore, based on Lemma 4.4, if we consider computing A T A and A T b in 0(dn + dn 2 ), 
we have: 

Theorem 4.5. The average complexity of Algorithm 1 is 0(dn + dn 2 + knS(n)), where k 
is the average number of iterations and S{n) is the average number of non-zero elements of 

v/. 

5 Experimental Evaluation 

In this section, we investigate the convergence speed, running time and optimality of the 
proposed algorithm in comparison to state-of-the-art algorithms belongs different research 
directions: active-set methods, iterative methods and accelerated methods. Particularly, we 
compare our algorithm Antilop with the following algorithms: 

• Fast: This is a modified effective version of active-set methods according to Bro R., 
de Jong S., Journal of Chemometrics, 1997 [BDJ97], which is developed by S. Gunn 1 . 
This algorithm can be considered as one of the fastest algorithms of active-set methods. 

• Nm: This is a non-monotonic fast method for large-scale nonnegative least squares 
based on iterative methods [KDD13]. The source code is downloaded from 2 . 

• Accer: This is a Nesterov accelerated method with convergence rate 0(1/7c 2 ) [GTLY12] . 
The source code is extracted from a module in the paper [GTLY12], The source code 
is downloaded from 3 . 

• Anti+Acc: This is a Nesterov accelerated method adding the anti-lopsided steps 
(from Line 2 to Line 4 in Algorithm 1). This evaluation investigates the effectiveness 
of the anti-lopsided steps for algorithms that only use the first derivative. 

To be a fair comparison, we evaluate the five algorithms on various test-cases which are 
randomly generated. Particularly, we use a set of test-cases with n = 4000 and d = 1.5n = 
6000 since usually d > n. Furthermore, we design 6 test-cases {Tl, ..., T6}, which have high 
potential to happen in practice. To generate test-cases having nontrivial solutions, matrix 
A and a vector x* are randomly generated, and b = Ax*. The sign of values in A and x* can 
be non-negative(+) or mixed-sign(±); and measure vectors {A l }f =] have the same (SAM), 
randomized (RAN), or various (VAR) lengths, which are enumerated in Table 1. Test-cases 
containing A, x* negative are ignored because they will have x = 0 as the optimal solution. 
Concerning the optimal value /* to evaluate, hence, it is 0 for nonnegative test cases; and 
it is the best minimum value found by all the methods for mixed-sign test cases because we 
do not know it for the test-cases in advance. 

1 http://web.mit.edu/ mkgray/matlab/svm/fnnls.m 

2 http: / / suvrit.de / work / progs / nnls.html 

3 https: / / sites.google.com / site / nmfsolvers/ 
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Table 1: Summery of the kinds of test-cases 



T1 

T2 

T3 

T4 

T5 

T6 

A,x* 

+ 

± 

+ 

± 

+ 

± 

Length of A^ 

SAM 

RAN 

VAR 

SAM 

RAN 

VAR 


Table 2: Range of 11^4.| 1 2 and ||A T A||| in test cases 



u\\l 

IIAU4HI 

T1 

1.8e+03 - 4.7e+06 

l.le+05 - 1.4e+13 

T2 

1.7e+03 - 3.5e+06 

9.4e+04 - 3.9e+ll 

T3 

2.4e+03 - 3.1e+06 

5.6e+05 - 6.0e+12 

T4 

2.3e+03 - 3.9e+06 

1.6e+05 - 7.5e+12 

T5 

2.6e+03 - 2.9e+06 

2.1e+05 - 5.1e+12 

T6 

9.9e+02 - 5.0e+06 

3.1e+04 - 1.5e+13 


For each test-case, we randomize 5 sub-tests having different sparsity of vectors A^ and 
x* from 0% to 40% to create the objective functions having different values of Lipschitz 
constant and various cases of solution (see Table 2). 

Since the algorithms can run for a long time, we must stop them when no finding better 
solution with a smaller value of ||V/|| 2 , reaching the maximum number of iterations k = 
5 n = 2.10 4 , or the maximum of running time is 1600(s) that is the running time of the 
traditional accurate method Fast [BDJ97] for all the test-cases. 

In addition, we develop Algorithm 1 and Algorithm 2 in Matlab to easily compare them 
with other algorithms. Furthermore, we set system parameters to use only 1 CPU for Matlab 
and the 10 time is excluded. All source codes of our algorithm and generating test-cases, 
and 30 used test-cases are published 4 . 

5.1 Convergence 

In this section, we investigate the convergence speed of the square of derivatives ||/||| —> 0 
(see Figure 3) and the difference between the values of objective function and the optimal 
values (/(xfc) — f* + 1) —> 10° during the running time (see Figure 4). The results clearly 
show that our proposed algorithm and algorithm Fast [BDJ97] converge to the optimal val¬ 
ues in all 30 test-cases. Remarkably, our algorithm converges to the optimal values much 
faster than the other methods. Moreover, interestingly, algorithm Anti-pAcc works more 
effectively than algorithm Accer. This proves that the anti-lopsided transformation makes 
the convergence of iterative methods using the first derivative more faster because the trans¬ 
formation significantly reduced the scaling problem of variables to obtain more appropriate 
gradient directions. 

4 https: //bitbucket.org/[will-publish]/antilopsidednnls 
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Figure 3: \\fW 2 during running time 



Figure 4: (/(xfc) — /* + !) during running time 
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Table 3: Average of running time in seconds for test-cases Tf, ...,Tq 


Test-case Antilop 

Fast 

Nm 

Accer 

Anti+Acc 

T1 

146.8 

1384.2 

205.2 

1600.0 

1600.0 

T2 

11.9 

160.6 

2.7 

253.4 

166.7 

T3 

71.5 

1471.9 

917.6 

1600.0 

1600.0 

T4 

12.5 

255.6 

906.4 

1600.0 

138.5 

T5 

154.7 

1455.3 

905.3 

1600.0 

1600.0 

T6 

12.3 

265.7 

906.9 

1600.0 

276.2 


Table 4: Average of \f(xk ) 

— /* in test-cases 

T\, ...,T 6 

Test- 

-case Antilop 

Fast 

Nm 

Accer 

Anti+Acc 

T1 

7E-15 

2E-15 

2E+03 

2E-03 

2E-03 

T2 

6E-08 

7E-08 

6E-08 

7E-08 

IE-07 

T3 

6E-16 

2E-16 

2E-01 

IE-01 

2E-04 

T4 

9E-09 

8E-09 

6E+03 

2E+03 

IE-10 

T5 

3E-09 

9E-10 

5E+05 

5E+05 

1E+03 

T6 

6E-03 

4E-03 

7E+09 

1E+09 

6E-04 


5.2 Running Time 

In the final comparison, we compare the algorithms in the average of running time on 5 
sub-tests of test-cases as shown in Table 3. Clearly, our proposed algorithm has the best 
averages of running time in most of test-cases, excepted for test-case T2. However, the non¬ 
monotonic algorithm (Nm) [KDD13] is unstable in other kinds of test-cases. Furthermore, 
the difference between our proposed algorithm and the non-monotonic algorithm for test- 
case T2 is not considerable. In comparison to the most stable algorithm Fast [BDJ97], our 
algorithm has the similar accuracy of optimal values, but it runs from 9 times to 21 times 
faster than algorithm Fast [BDJ97]. 

5.3 Optimality 

This section is dedicated to investigate the optimal effectiveness of the compared algorithms. 
The results shown in Table 4 are the average values \f(xk) — f*\ of 5 sub-test cases for 6 test- 
cases. Obviously, our algorithm and Fast [BDJ97] best values that are highly distinguished 
from the other methods. All the methods have errors | f{x^) — f*\ that can be raised by ap¬ 
proximately computing float numbers in computers and characteristics of iterative methods. 
Hence, the minor differences between the proposed algorithm’s results and the best results 
can be acceptable for the size of large matrix d x n, and the big values of Lipschitz constant 
related to ||A|| 2 . 
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6 Conclusion and Discussion 


In the paper, we proposed a fast robust anti-lopsided algorithm to solve the nonnegative 
least squares problem that is one of the most fundamental problem in data modeling. We 
theoretically proved that our algorithm has linear convergence (1 — 2 \\Q\\ 2 ) k 011 ^ ie sub-space 
of passive variables, where y/n < ||Q ||2 < n, and n is the dimension of solutions. 

In addition, we carefully compare the proposed algorithm with state-of-the-art algorithms 
in different major research directions on large matrices about three aspects: convergence rate, 
running time and optimality of solutions on 30 randomized tests among 6 different test- 
cases which often occur in practice. On convergence rate and running time, the proposed 
algorithm’s results are highly distinct from other ones’ results. On the optimality of solutions, 
the results of our algorithm are very close to the most accurate results. These differences 
are not significant and inevitable because they are raised by computing approximately float 
numbers in computers and the characteristics of iterative methods using many float operators 
on the large-scale matrices. Therefore, it can be said that the proposed algorithm obtains 
all of three significant aspects on convergence speed, running time and accuracy. 

Finally, the convergence speed of the Nesterov accelerated method [GTLY12] using the 
anti-lopsided steps is much more faster than Nesterov accelerated method directly applied. 
Hence, we strongly believe that the anti-lopsided steps can have a significant impact on 
iterative methods using the first derivative for solving least squares problems and related 
quadratic programming problems. 
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